* This do-file generates the results for Table 3 and
* prepares the csv input file ("origin_10per_rand.csv") used to compute Table 4

* Input files
* basic76.dta: patent number (wku), application date (apd), issue date (isd),
*              assignee name (nam_assg), and classification
* institution_pat: institution patents (the assignee is an institution)
* invent_location76: patent location (cnt) based on inventor data
*                   (if there are multiple inventors, one location is chosen at random)
* citing_cited76: citation links (citing, cited)


* 1. Find US-originating patents
* (1) US patent
* (2) institution assignee
* (3) issue year is one of the cohort years (1976, 1986, 1996, 2006)
use basic76, clear
gen year = floor(isd/10000)
keep if year == 1976 | year == 1986 | year == 1996 | year == 2006
* (3) issue year (from isd) is one of the cohort years (1976, 1986, 1996, 2006)
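
* Optional check (a sketch, not part of the original workflow): list how many
* patents fall in each cohort year after the filter above. This assumes isd is
* stored as a YYYYMMDD number, which is what floor(isd/10000) relies on.
tabulate year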

mer 1:1 wku using institution_pat
keep if _m == 3
keep wku year
* (2) institution assignee (patent is listed in institution_pat)

mer 1:1 wku using invent_location76
keep if _m == 3
keep if cnt == "US"
drop _m
keep wku year
duplicates drop
* (1) US patent (based on the randomly chosen inventor location)
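
* Optional check (a sketch): the 1:n merge on cited in step 2 needs one row per
* originating patent. capture noisily reports a violation without stopping here.
capture noisily isid wku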

sa origin_10per_rand, replace


* 2. Count the number of non-self citations within 10 years

use origin_10per_rand, clear

ren wku cited
mer 1:n cited using citing_cited76
keep if _m == 3
drop _m
* attach every citing patent to each originating (cited) patent

ren cited wku
mer n:1 wku using basic76
keep if _m == 3

keep wku year citing nam_assg
ren (wku nam_assg) (cited cited_assg)
* look up the assignee name of the cited patent

ren citing wku
mer n:1 wku using basic76
keep if _m == 3
drop _m
* look up the application date, issue date, and assignee name of the citing patent

drop if year == 1976 & (apd < 19760000 | apd > 19860000)
drop if year == 1986 & (apd < 19860000 | apd > 19960000)
drop if year == 1996 & (apd < 19960000 | apd > 20060000)
drop if year == 2006 & (apd < 20060000 | apd > 20160000 | isd > 20150600)
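* keep citations whose citing patent was applied for within the 10-year window
* after the cohort year (for the 2006 cohort, the citing patent must also be
* issued no later than June 2015)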
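
* Optional check (a sketch): after the drops above, every remaining citing
* patent should have an application date between year*10000 and
* (year + 10)*10000, the same bounds written out per cohort above.
* This assumes apd is a YYYYMMDD number.
capture noisily assert inrange(apd, year*10000, (year + 10)*10000)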

keep cited year wku cited_assg nam_assg
ren (wku nam_assg) (citing citing_assg)

replace cited_assg = subinstr(cited_assg," ","",.)
replace citing_assg = subinstr(citing_assg," ","",.)

replace cited_assg = lower(cited_assg)
replace citing_assg = lower(citing_assg)

recast str12 cited_assg, force
recast str12 citing_assg, force
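
* Optional diagnostic (a sketch): the names are now stripped of spaces,
* lowercased, and truncated to 12 characters, so two blank assignee names
* compare as equal and such pairs are also removed by the self-citation rule
* below. This counts them before that rule is applied.
count if cited_assg == "" & citing_assg == ""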

drop if cited_assg == citing_assg
* drop self-citations (citing and cited assignee names match)

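gen x = 1
collapse (sum) x, by(cited year)
* x: number of non-self citations received by each cited patent

gsort +year -x +cited

by year: gen y = _n
by year: gen z = _N
gen percent = 100 * y/z
ren x num_of_cit
* y: rank within the cohort (1 = most cited)
* z: cohort size (patents with at least one counted citation)
* percent: cumulative share of the cohort, counted from the most cited patent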

sa origin_10per_rand, replace

* Table 3 results: keep roughly the top 10% most-cited patents in each cohort
drop if year == 1976 & num_of_cit < 8
* upper 12.07%
drop if year == 1986 & num_of_cit < 15
* upper 10.18%
drop if year == 1996 & num_of_cit < 37
* upper 10.14%
drop if year == 2006 & num_of_cit < 25
* upper 10.27%
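
* Optional check (a sketch): percent is the cumulative share of a cohort's cited
* patents ranked from the most cited, so the largest percent left in each
* cohort is the share retained by its cutoff. If the "upper ..%" notes above
* were computed this way, the max column should reproduce them.
tabstat percent, by(year) statistics(n max) format(%9.2f)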

ren cited wku
export delimited wku using "origin_10per_rand.csv", delimiter("*") replace







